SEQ ID NO:  PFAM NAME        DESCRIPTION                                   p-value      PFAM SCORE

429         zf-C3HC4         Zinc finger, C3HC4 type (RING finger)         8.6e-11      39.2
431         DEAD             DEAD/DEAH box helicase                        1e-66        514.6
432         SH3              SH3 domain                                    [missing]    67.2
433         GTP_CDC          Cell division protein                         2.1e-114     393.5
436         Collagen         Collagen triple helix repeat (20 copies)      4.6e-194     658.1
438         Ricin_B_lectin   Similarity to lectin domain of ricin b        0.0085       10.5
441         Alpha_adaptin_C  Alpha adaptin carboxyl-terminal domai         1.2e-256     866.0
442         Alpha_adaptin_C  Alpha adaptin carboxyl-terminal domai         1.8e-235     795.7
443         PDZ              PDZ domain (Also known as DHR or GLGF).       1.9e-65      230.9
445         LON              ATP-dependent protease La (LON) domain        0.00012      -17.1
446         ig               Immunoglobulin domain                         0.00011      20.1
451         sushi            Sushi domain (SCR repeat)                     1.4e-18      75.2
452         fn3              Fibronectin type III domain                   1.5e-06      35.2
454         pyridoxal_deC    Pyridoxal-dependent decarboxylase conse       8.3e-14      50.3
456         kinesin          Kinesin motor domain                          4.9e-217     734.4
457         neur_chan        Neurotransmitter-gated ion-channel            1e-175       597.1
458         Josephin         Josephin                                      0.0002       18.7
468         bZIP             bZIP transcription factor                     1.7e-07      31.8
470         NTP_transferase  Nucleotidyl transferase                       6.3e-06      -26.3
471         WD40             WD domain, G-beta repeat                      2e-28        107.9
473         LIM              LIM domain containing proteins                0.00021      20.7
477         zf-RanBP         Zn-finger in Ran binding protein and others.  0.028        21.0
479         WD40             WD domain, G-beta repeat                      6.5e-18      73.0
480         KRAB             KRAB box                                      1e-31        118.8
481         ArfGap           Putative GTP-ase activating protein for Arf   8.4e-6[?]    232.0
485         SH2              Src homology domain 2                         0.011        11.4
486         C1q              C1q domain                                    4.3e-74      259.6
487         dsrm             Double-stranded RNA binding motif             1.1e-47      171.9
489         zf-C2H2          Zinc finger, C2H2 type                        4.8e-153     521.9
490         Alpha_adaptin_C  Alpha adaptin carboxyl-terminal domai         3.4e-222     751.[?]
492         SKI              Shikimate kinase                              1.2e-10      48.8
497         ENV_polyprotein  ENV polyprotein (coat polyprotein)            2.6e-22      77.6
498         abhydrolase_2    Phospholipase/Carboxylesterase                0.041        [illegible]
500         rrm              RNA recognition motif.                        5.4e-34      126.4
501         WW               WW domain                                     4.6e-18      [missing]
502         ig               Immunoglobulin domain                         1.1e-10      39.5
504         abhydrolase      alpha/beta hydrolase fold                     0.045        [illegible]
505         vwa              von Willebrand factor type A domai            7.1e-62      219.0
508         Na_K_ATPase_C    Na+/K+ ATPase C-terminus                      2.3e-145     496.3
509         Exonuclease      Exonuclease                                   1.3e-55      201.5
510         Glycos_transf_1  Glycosyl transferases group 1                 2.9e-06      27.0
511         Glycos_transf_1  Glycosyl transferases group 1                 2.9e-06      27.0
512         Glycos_transf_1  Glycosyl transferases group 1                 1.9e-09      38.5
514         pro_isomerase    Cyclophilin type peptidyl-prolyl cis-tr       1.8e-63      221.4
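
As an illustration of how Pfam results of this kind might be screened, the following Python sketch keeps only hits whose p-value falls at or below a cutoff and sorts them from most to least significant. The `PfamHit` class, the `confident_hits` function, and the 1e-5 cutoff are hypothetical names and values chosen for this example; they are not part of the disclosure. The four sample rows are taken from the table above.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PfamHit:
    """One row of the Pfam results table (hypothetical container)."""
    seq_id_no: int
    pfam_name: str
    p_value: float
    score: float


# Four rows copied from the table; the remaining rows are omitted for brevity.
HITS = [
    PfamHit(429, "zf-C3HC4", 8.6e-11, 39.2),
    PfamHit(438, "Ricin_B_lectin", 0.0085, 10.5),
    PfamHit(456, "kinesin", 4.9e-217, 734.4),
    PfamHit(485, "SH2", 0.011, 11.4),
]


def confident_hits(hits: Iterable[PfamHit], max_p_value: float = 1e-5) -> List[PfamHit]:
    """Return hits with p_value <= max_p_value, most significant first.

    The default cutoff is an assumption made for illustration only.
    """
    kept = [h for h in hits if h.p_value <= max_p_value]
    return sorted(kept, key=lambda h: h.p_value)


if __name__ == "__main__":
    for hit in confident_hits(HITS):
        print(hit.seq_id_no, hit.pfam_name, hit.p_value, hit.score)
```

With the sample rows, the kinesin and zf-C3HC4 hits pass the assumed cutoff, while the Ricin_B_lectin and SH2 hits do not.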



Polynucleotides encoding preferred polypeptide truncations of the invention can be used
to generate polynucleotides encoding chimeric or fusion proteins comprising one or more 
domains of the invention and heterologous protein sequences. 

The polynucleotides of the invention additionally include the complement of any of the 
polynucleotides recited above. The polynucleotide can be DNA (genomic, cDNA, amplified, or
synthetic) or RNA. Methods and algorithms for obtaining such polynucleotides are well known 
to those of skill in the art and can include, for example, methods for determining hybridization 
conditions that can routinely isolate polynucleotides of the desired sequence identities. 

In accordance with the invention, polynucleotide sequences comprising the mature
protein coding sequences corresponding to any one of SEQ ID NO:1-1786 and 3573-5358, or
functional equivalents thereof, may be used to generate recombinant DNA molecules that direct 
the expression of that nucleic acid, or a functional equivalent thereof, in appropriate host cells. 
Also included are the cDNA inserts of any of the clones identified herein. 

A polynucleotide according to the invention can be joined to any of a variety of other
nucleotide sequences by well-established recombinant DNA techniques (see Sambrook J et al.
(1989) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, NY). Useful
nucleotide sequences for joining to polynucleotides include an assortment of vectors, e.g.,
plasmids, cosmids, lambda phage derivatives, phagemids, and the like, that are well known in the
art. Accordingly, the invention also provides a vector including a polynucleotide of the
invention and a host cell containing the polynucleotide. In general, the vector contains an origin
of replication functional in at least one organism, convenient restriction endonuclease sites, and a
selectable marker for the host cell. Vectors according to the invention include expression
vectors, replication vectors, probe generation vectors, and sequencing vectors. A host cell
according to the invention can be a prokaryotic or eukaryotic cell and can be a unicellular
organism or part of a multicellular organism.

The present invention further provides recombinant constructs comprising a nucleic acid
having any of the nucleotide sequences of SEQ ID NO:1-1786 and 3573-5358 or a fragment
thereof or any other polynucleotides of the invention. In one embodiment, the recombinant
constructs of the present invention comprise a vector, such as a plasmid or viral vector, into
which a nucleic acid having any of the nucleotide sequences of SEQ ID NO:1-1786 and 3573-5358
or a fragment thereof is inserted, in a forward or reverse orientation. In the case of a vector
comprising one of the ORFs of the present invention, the vector may further comprise regulatory
sequences, including for example, a promoter, operably linked to the ORF. Large numbers of
suitable vectors and promoters are known to those of skill in the art and are commercially
available for generating the recombinant constructs of the present invention. The following
vectors are provided by way of example. Bacterial: pBs, phagescript, PsiX174, pBluescript SK,
pBs KS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene); pTrc99A, pKK223-3, pKK233-3,
pDR540, pRIT5 (Pharmacia). Eukaryotic: pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene);
pSVK3, pBPV, pMSG, pSVL (Pharmacia).

The isolated polynucleotide of the invention may be operably linked to an expression
control sequence such as the pMT2 or pED expression vectors disclosed in Kaufman et al.,
Nucleic Acids Res. 19, 4485-4490 (1991), in order to produce the protein recombinantly. Many
suitable expression control sequences are known in the art. General methods of expressing
recombinant proteins are also known and are exemplified in R. Kaufman, Methods in
Enzymology 185, 537-566 (1990). As defined herein "operably linked" means that the isolated
polynucleotide of the invention and an expression control sequence are situated within a vector
or cell in such a way that the protein is expressed by a host cell which has been transformed
(transfected) with the ligated polynucleotide/expression control sequence.

Promoter regions can be selected from any desired gene using CAT (chloramphenicol
acetyltransferase) vectors or other vectors with selectable markers. Two appropriate vectors are
pKK232-8 and pCM7. Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt,
lambda PR, and trc. Eukaryotic promoters include CMV immediate early, HSV thymidine
kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of
the appropriate vector and promoter is well within the level of ordinary skill in the art.

Generally, recombinant expression vectors will include origins of replication and selectable
markers permitting transformation of the host cell, e.g., the ampicillin resistance gene of E. coli
and S. cerevisiae TRP1 gene, and a promoter derived from a highly-expressed gene to direct
transcription of a downstream structural sequence. Such promoters can be derived from operons
encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), α-factor, acid
phosphatase, or heat shock proteins, among others. The heterologous structural sequence is
assembled in appropriate phase with translation initiation and termination sequences, and
preferably, a leader sequence capable of directing secretion of translated protein into the
periplasmic space or extracellular medium. Optionally, the heterologous sequence can encode a
fusion protein including an amino terminal identification peptide imparting desired
characteristics, e.g., stabilization or simplified purification of expressed recombinant product.

Useful expression vectors for bacterial use are constructed by inserting a structural DNA
sequence encoding a desired protein together with suitable translation initiation and termination
signals in operable reading phase with a functional promoter. The vector will comprise one or
more phenotypic selectable markers and an origin of replication to ensure maintenance of the
vector and to, if desirable, provide amplification within the host. Suitable prokaryotic hosts for
transformation include E. coli, Bacillus subtilis, Salmonella typhimurium and various species
within the genera Pseudomonas, Streptomyces, and Staphylococcus, although others may also be
employed as a matter of choice.
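
The requirement that a structural sequence be in operable reading phase with its initiation and termination signals can be illustrated with a simple check. The Python sketch below is an assumption-laden example rather than a method described here: the `frame_problems` function is hypothetical, it assumes an ATG initiation codon and the standard stop codons TAA, TAG, and TGA, and it ignores promoter spacing and ribosome binding sites.

```python
from typing import List

# Standard termination codons on the coding (sense) DNA strand.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def frame_problems(coding_dna: str) -> List[str]:
    """List reading-frame problems in a coding sequence (empty list if none).

    The sequence is expected to start at the initiation codon and end with
    the termination codon, so that every codon in between is in phase.
    """
    seq = coding_dna.upper()
    problems = []
    if not seq.startswith("ATG"):
        problems.append("no ATG initiation codon at position 0")
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    # Split into complete codons only; a trailing partial codon is ignored.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no in-frame termination codon at the end")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("premature in-frame stop codon")
    return problems


if __name__ == "__main__":
    print(frame_problems("ATGGCCAAATAA"))      # in-frame example
    print(frame_problems("ATGGCCTAAAAATAA"))   # contains an early stop
```

A check of this kind catches only frame errors within the insert; whether the promoter drives expression in a given host still has to be established experimentally.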

As a representative but non-limiting example, useful expression vectors for bacterial use 
can comprise a selectable marker and bacterial origin of replication derived from commercially 
available plasmids comprising genetic elements of the well known cloning vector pBR322 
(ATCC 37017). Such commercial vectors include, for example, pKK223-3 (Pharmacia Fine 
Chemicals, Uppsala, Sweden) and GEM1 (Promega Biotech, Madison, WI, USA). These
pBR322 "backbone" sections are combined with an appropriate promoter and the structural
sequence to be expressed. Following transformation of a suitable host strain and growth of the
host strain to an appropriate cell density, the selected promoter is induced or derepressed by
appropriate means (e.g., temperature shift or chemical induction) and cells are cultured for an
additional period. Cells are typically harvested by centrifugation, disrupted by physical or 
chemical means, and the resulting crude extract retained for further purification. 

Polynucleotides of the invention can also be used to induce immune responses. For 
example, as described in Fan et al., Nat. Biotech. 17:870-872 (1999), incorporated herein by
reference, nucleic acid sequences encoding a polypeptide may be used to generate antibodies 
against the encoded polypeptide following topical administration of naked plasmid DNA or 
following injection, and preferably intramuscular injection of the DNA. The nucleic acid 
sequences are preferably inserted in a recombinant expression vector and may be in the form of 
naked DNA. 

4.3 ANTISENSE 

Another aspect of the invention pertains to isolated antisense nucleic acid molecules that 
are hybridizable to or complementary to the nucleic acid molecule comprising the nucleotide 
sequence of SEQ ID NO:1-1786 and 3573-5358, or fragments, analogs or derivatives thereof.
An "antisense" nucleic acid comprises a nucleotide sequence that is complementary to a "sense" 
nucleic acid encoding a protein, e.g., complementary to the coding strand of a double-stranded 
cDNA molecule or complementary to an mRNA sequence. In specific aspects, antisense nucleic 
acid molecules are provided that comprise a sequence complementary to at least about 10, 25, 
50, 100, 250 or 500 nucleotides or an entire coding strand, or to only a portion thereof. Nucleic
acid molecules encoding fragments, homologs, derivatives and analogs of a protein of any of 
SEQ ID NO:1787-3572 and 5359-7144 or antisense nucleic acids complementary to a nucleic 
acid sequence of SEQ ID NO:1-1786 and 3573-5358 are additionally provided.
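
The complementarity relationship described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names `antisense_rna` and `antisense_fragment` are hypothetical, the short sequence used is invented for the example and is not any sequence of the invention, and only Watson-Crick pairing over the four RNA bases is modeled.

```python
# Watson-Crick complement table for RNA bases.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(sense_mrna: str) -> str:
    """Return the antisense RNA (reverse complement), written 5' to 3'."""
    sense = sense_mrna.upper()
    unexpected = set(sense) - set("ACGU")
    if unexpected:
        raise ValueError(f"unexpected bases: {sorted(unexpected)}")
    return sense.translate(_COMPLEMENT)[::-1]


def antisense_fragment(sense_mrna: str, start: int, length: int) -> str:
    """Antisense molecule complementary to `length` nucleotides of the sense
    strand beginning at 0-based position `start`."""
    target = sense_mrna[start:start + length]
    if start < 0 or len(target) < length:
        raise ValueError("requested region extends beyond the sense strand")
    return antisense_rna(target)


if __name__ == "__main__":
    mrna = "AUGGCCAAAUUUGGG"  # invented example sequence
    print(antisense_rna(mrna))
    print(antisense_fragment(mrna, 0, 10))  # complementary to 10 nucleotides
```

Passing the full sense strand to `antisense_rna` corresponds to an antisense molecule complementary to the entire strand, while `antisense_fragment` corresponds to one complementary to only a portion of it.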






